WIP OCPNODE-4529: Migrate test case 44493 - configurable terminationGracePeriodSeconds for probes #31170
BhargaviGudi wants to merge 1 commit into
Conversation
Test validates configurable terminationGracePeriodSeconds for liveness and startup probes. Verifies that:
- Liveness probes honor probe-level terminationGracePeriodSeconds (10s vs pod-level 60s)
- Startup probes honor probe-level terminationGracePeriodSeconds (10s vs pod-level 60s)
- Probes without probe-level setting fall back to pod-level terminationGracePeriodSeconds (60s)
Pipeline controller notification
For optional jobs, comment
This repository is configured in: automatic mode

[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: BhargaviGudi
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Walkthrough
A new Ginkgo e2e test is added to validate configurable terminationGracePeriodSeconds for Kubernetes liveness and startup probes. The test parses pod event timestamps, measures probe failure timing, and verifies the observed grace periods match configured values or pod-level defaults when probe-level settings are absent.
Changes
Probe Termination Grace Period E2E Test
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~15 minutes
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 11 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
Actionable comments posted: 1
🧹 Nitpick comments (4)
test/extended/node/node_e2e/node.go (4)
282-432: ⚡ Quick win
Three near-identical pod specs: extract a builder helper.
The three pods (`liveness-probe`, `startup-probe`, `liveness-probe-no-term`) duplicate ~50 lines of spec each (same image, security context, command, ports). Only the probe kind, name, and probe-level `TerminationGracePeriodSeconds` differ. A small builder would shrink the test by ~100 lines and make the differences explicit.
♻️ Sketch
```go
buildProbePod := func(name string, probe *corev1.Probe, kind string) *corev1.Pod {
	container := corev1.Container{
		Name:    "test",
		Image:   "quay.io/openshifttest/nginx-alpine@sha256:04f316442d48ba60e3ea0b5a67eb89b0b667abf1c198a3d0056ca748736336a0",
		Command: []string{"bash", "-c", "sleep 100000000"},
		Ports:   []corev1.ContainerPort{{ContainerPort: 8080}},
		SecurityContext: &corev1.SecurityContext{
			AllowPrivilegeEscalation: ptr.To(false),
			Capabilities:             &corev1.Capabilities{Drop: []corev1.Capability{"ALL"}},
		},
	}
	switch kind {
	case "liveness":
		container.LivenessProbe = probe
	case "startup":
		container.StartupProbe = probe
	}
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: corev1.PodSpec{
			TerminationGracePeriodSeconds: ptr.To[int64](60),
			SecurityContext: &corev1.PodSecurityContext{
				RunAsNonRoot:   ptr.To(true),
				SeccompProfile: &corev1.SeccompProfile{Type: corev1.SeccompProfileTypeRuntimeDefault},
			},
			Containers: []corev1.Container{container},
		},
	}
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 282 - 432, the three pod specs (liveness-probe, startup-probe, liveness-probe-no-term) duplicate container/image/security/command/ports setup. Extract a builder like buildProbePod(name string, probe *corev1.Probe, kind string) that constructs the common corev1.Container (Image, Command, Ports, SecurityContext), attaches either LivenessProbe or StartupProbe based on kind, and returns the full *corev1.Pod with the shared PodSpec (pod-level TerminationGracePeriodSeconds and PodSecurityContext). Replace the three inline pod literals with calls to buildProbePod("liveness-probe", livenessProbe, "liveness"), buildProbePod("startup-probe", startupProbe, "startup"), and a buildProbePod("liveness-probe-no-term", ..., "liveness") call that passes a liveness probe with no probe-level TerminationGracePeriodSeconds (passing nil would create a pod with no probe at all, so the pod-level fallback would never be exercised), and keep verifyProbeTermination calls unchanged.
216-279: 🏗️ Heavy lift
Use the Events API directly instead of parsing humanized `oc describe` output.
The helper extracts timestamps by splitting describe output and indexing `fields[2]`. This is fragile:
- The "Age" column position depends on the describe template and event aggregation. While aggregated events (e.g., `60s (x3 over 90s)`) keep the last-seen value at `fields[2]`, any format change breaks parsing silently.
- The substring match `"Started container"` and multi-substring match for probe failures could match unrelated events if more containers are added.

Use the Events API directly: query `oc.KubeClient().CoreV1().Events(namespace).List(...)`, filter by `involvedObject` and `reason` (`Started`, `Killing`), and consume `event.FirstTimestamp` / `event.LastTimestamp` directly without parsing humanized durations. This pattern is idiomatic across the test suite.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 216 - 279, The current verifyProbeTermination function is brittle because it parses humanized `oc describe` output and indexes fields[2]; replace that logic with a direct Events API query: inside verifyProbeTermination (and remove dependence on parseDurationToSeconds), call oc.KubeClient().CoreV1().Events(namespace).List(...) (or use the client wrapper available in the test helpers), filter events by event.InvolvedObject.Name == podName and by event.Reason == "Started" for container starts and event.Reason == "Killing" (or the probe-failure reason used by your cluster) for probe/termination events, then use the event.FirstTimestamp/LastTimestamp fields to compute seconds difference and compare to expectedTerminationSec with the same tolerance logic; log the selected event timestamps and keep the polling/wait.PollUntilContextTimeout wrapper and return true when the time diff is within range.
185-213: ⚡ Quick win
Replace custom duration parser with `time.ParseDuration`.
The Go standard library already parses the exact formats kubectl emits (`"45s"`, `"1m30s"`, `"1h2m3s"`). The current implementation silently returns `0, nil` for unrecognized inputs (e.g., `"5h"` since it matches neither the `"m"` nor `"s"` branches), which would cause downstream timing arithmetic to be wrong without surfacing an error. Switching to `time.ParseDuration` removes the custom code path and the silent-zero failure mode.
♻️ Proposed refactor
```diff
-	// Helper function to parse duration string like "1m30s" or "45s" to seconds
-	parseDurationToSeconds := func(durationStr string) (int, error) {
-		var totalSeconds int
-		if strings.Contains(durationStr, "m") {
-			parts := strings.Split(durationStr, "m")
-			minutes, err := strconv.Atoi(parts[0])
-			if err != nil {
-				return 0, err
-			}
-			totalSeconds = minutes * 60
-			if len(parts) > 1 && strings.Contains(parts[1], "s") {
-				secStr := strings.TrimSuffix(parts[1], "s")
-				if secStr != "" {
-					seconds, err := strconv.Atoi(secStr)
-					if err != nil {
-						return 0, err
-					}
-					totalSeconds += seconds
-				}
-			}
-		} else if strings.Contains(durationStr, "s") {
-			secStr := strings.TrimSuffix(durationStr, "s")
-			seconds, err := strconv.Atoi(secStr)
-			if err != nil {
-				return 0, err
-			}
-			totalSeconds = seconds
-		}
-		return totalSeconds, nil
-	}
+	// Helper function to parse duration string like "1m30s" or "45s" to seconds
+	parseDurationToSeconds := func(durationStr string) (int, error) {
+		d, err := time.ParseDuration(durationStr)
+		if err != nil {
+			return 0, err
+		}
+		return int(d.Seconds()), nil
+	}
```
The `strconv` import can then be removed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 185 - 213, The custom parser function parseDurationToSeconds should be replaced to use time.ParseDuration: call time.ParseDuration(durationStr), return int(duration.Seconds()) on success and propagate the error on failure so unrecognized inputs (e.g., "5h") don't silently return 0; update the function signature/returns accordingly, remove the now-unused strconv import, and ensure callers still get an int number of seconds from the parsed duration.
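As a quick illustration of the behavior change described above, the following sketch wraps `time.ParseDuration` the same way the proposed refactor does and runs it on a few sample strings. The helper name `toSeconds` and the inputs are chosen for this example only. One caveat worth keeping in mind: `time.ParseDuration` accepts units up to hours, so a day-style value such as `"2d"` is rejected with an error instead of being parsed.

```go
package main

import (
	"fmt"
	"time"
)

// toSeconds converts a Go-style duration string to whole seconds.
// Unsupported input yields an error rather than a silent zero.
func toSeconds(s string) (int, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err
	}
	return int(d.Seconds()), nil
}

func main() {
	// "5h" is the input the custom parser mapped to 0 with no error;
	// "2d" and "soon" are invalid for time.ParseDuration.
	for _, in := range []string{"45s", "1m30s", "5h", "2d", "soon"} {
		secs, err := toSeconds(in)
		if err != nil {
			fmt.Printf("%q: error: %v\n", in, err)
			continue
		}
		fmt.Printf("%q: %d seconds\n", in, secs)
	}
}
```

Because errors are propagated, a caller can fail the poll iteration (or the test) when an unexpected format shows up, instead of comparing against a bogus zero.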
288-291: 💤 Low value
Use a pointer helper instead of `&[]T{v}[0]`.
The `&[]int64{60}[0]` / `&[]bool{true}[0]` pattern (used 11 times in this test) allocates a single-element slice solely to take its address. Prefer `ptr.To` from `k8s.io/utils/ptr` for clarity and consistency with the rest of the k8s ecosystem.
♻️ Example refactor
```diff
 import (
 	...
+	"k8s.io/utils/ptr"
 	...
 )
```
```diff
-	TerminationGracePeriodSeconds: &[]int64{60}[0],
+	TerminationGracePeriodSeconds: ptr.To[int64](60),
 	SecurityContext: &corev1.PodSecurityContext{
-		RunAsNonRoot: &[]bool{true}[0],
+		RunAsNonRoot: ptr.To(true),
 		...
 	},
 	...
-	AllowPrivilegeEscalation: &[]bool{false}[0],
+	AllowPrivilegeEscalation: ptr.To(false),
 	...
-	TerminationGracePeriodSeconds: &[]int64{10}[0],
+	TerminationGracePeriodSeconds: ptr.To[int64](10),
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/node.go` around lines 288 - 291, Replace the hacky &[]T{v}[0] pointer constructions with the ptr.To helper from k8s.io/utils/ptr for clarity and consistency: e.g. change TerminationGracePeriodSeconds: &[]int64{60}[0] to use ptr.To(int64(60)) and RunAsNonRoot: &[]bool{true}[0] to ptr.To(true); update all other similar occurrences (about 11 spots) in this file (look for TerminationGracePeriodSeconds, RunAsNonRoot, SeccompProfile usages) and add the import for "k8s.io/utils/ptr" if not already present.
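For readers unfamiliar with the helper, the sketch below shows the idea with a local generic function. The name `to` is a hypothetical stand-in defined here so the example runs with the standard library alone; it mirrors the `func To[T any](v T) *T` shape of `ptr.To` but is not the `k8s.io/utils/ptr` package itself.

```go
package main

import "fmt"

// to returns a pointer to a copy of v. It is a local stand-in for ptr.To.
func to[T any](v T) *T {
	return &v
}

func main() {
	// Explicit type argument: an untyped constant 60 would otherwise become int.
	grace := to[int64](60)
	// Type inferred from the argument.
	nonRoot := to(true)
	// The slice-literal idiom the review suggests replacing.
	old := &[]int64{60}[0]

	fmt.Println(*grace, *nonRoot, *old) // prints "60 true 60"
}
```

Both `to[int64](60)` and `to(int64(60))` yield a `*int64`, which is why the two spellings that appear in the review comment are interchangeable.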
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 3b8a78f5-42e4-4bc0-921b-438671a54f52
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
```go
//author: minmli@redhat.com
//migrated from openshift-tests-private
//automates: https://issues.redhat.com/browse/OCPBUGS-44493
g.It("[OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493]", ote.Informing(), func() {
```
Test name doesn't match the run command in the PR description.
Line 170 uses terminationGracePeriod (no “Seconds”), but the PR description's test command targets terminationGracePeriodSeconds. The Kubernetes API field is terminationGracePeriodSeconds; align the title to match (and ensure the Polarion title matches as well).
📝 Proposed fix
```diff
-	g.It("[OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493]", ote.Informing(), func() {
+	g.It("[OTP] add configurable terminationGracePeriodSeconds to liveness and startup probes [OCP-44493]", ote.Informing(), func() {
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-	g.It("[OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493]", ote.Informing(), func() {
+	g.It("[OTP] add configurable terminationGracePeriodSeconds to liveness and startup probes [OCP-44493]", ote.Informing(), func() {
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/node/node_e2e/node.go` at line 170, Update the test title
string used in g.It to reference the correct Kubernetes field name
"terminationGracePeriodSeconds" (currently "terminationGracePeriod") and ensure
any Polarion title/identifier used in the same test block is updated to match;
locate the g.It invocation (the test named "[OTP] add configurable
terminationGracePeriod to liveness and startup probes [OCP-44493]") and change
the human-readable title and Polarion metadata to use
"terminationGracePeriodSeconds" so the test name matches the PR description and
API field.
Scheduling required tests:
@BhargaviGudi: all tests passed!
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: b783cc7
New tests seen in this PR at sha: b783cc7
@BhargaviGudi: This pull request references OCPNODE-4529 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
PR needs rebase.
Summary
Migrates test case OCP-44493 from openshift-tests-private to origin.
Validates that Kubernetes liveness and startup probes honor their probe-level `terminationGracePeriodSeconds` setting instead of defaulting to the pod-level value.
Polarion
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-44493
Bug
OCPBUGS-44493
Test Coverage
Implementation
Testing
```bash
./openshift-tests run-test "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriodSeconds to liveness and startup probes [OCP-44493]"
```
Summary by CodeRabbit